Abstract: This paper discusses the architecture and provides performance
studies of a silicon photonic chip-scale optical switch for scalable
interconnect networks in high performance computing systems. The
proposed switch exploits the optical wavelength parallelism and wavelength
routing characteristics of an Arrayed Waveguide Grating Router (AWGR)
to allow contention resolution in the wavelength domain. Simulation results
from a cycle-accurate network simulator indicate that, even with only two
transmitter/receiver pairs per node, the switch exhibits lower end-to-end
latency and higher throughput at high (>90%) input loads compared with
electronic switches. On the device integration level, we propose to integrate
all the components (ring modulators, photodetectors and AWGR) on a
CMOS-compatible silicon photonic platform to ensure a compact, energy
efficient and cost-effective device. We successfully demonstrate
proof-of-concept routing functions on an 8 x 8 prototype fabricated using
foundry services provided by OpSIS-IME.

1. Introduction

Scalable,  low  latency,  and  high-throughput  interconnection  is  essential  for  future  high 
performance  computing  (HPC)  applications  [1].  Interconnect  networks  based  on  electronic 
multistage  topologies  (e.g.  Fat-Tree,  CLOS,  Torus,  Flattened  Butterfly  [2,  3])  result  in  large 
latencies,  due  to  the  multi-hop  nature  of  these  networks  and  high  power  consumption  in  the 
buffers and the switch fabric. It is increasingly difficult to meet high-bandwidth and
low-latency communication requirements using conventional electrical switches. On the other hand, integrated
optics may enable the continued scaling of capacity required by future HPC systems. Silicon
photonics is now the most active discipline within the field of integrated optics due to its
compatibility with mature silicon IC manufacturing. Other motivations include the
availability of high-quality, high-index-contrast silicon-on-insulator (SOI) wafers, which enable the
scaling of photonic devices to the hundreds-of-nanometers level, and excellent material
properties such as high thermal conductivity, high optical damage threshold and high optical
nonlinearities [4]. Recent advances in key components, such as high-port-count low-loss
silicon  AWG  [5]  and  AWGR  [6],  Si  ring  modulators  [7],  high-responsivity  epitaxial 
Germanium  (Ge)  photodetectors  (PD)  [8],  hybrid  semiconductor  optical  amplifiers  (SOA)  [9] 
and  laser  sources  [10],  are  paving  the  way  for  a  disruptive  step  in  device  integration  for  large 
chip-scale  optical  switch  systems. 

Among all the proposed and existing optical interconnect architectures for HPC and
datacenters, AWGR-based solutions have drawn strong attention due to their dense
interconnectivity and unique wavelength routing capability. For example, LIONS (previously
named DOS) [11], Petabit [12], and IRIS [13] are all based on AWGRs and Tunable
Wavelength Converters (TWC). They benefit from the high capacity offered by Wavelength
Division Multiplexing (WDM). Furthermore, multiple WDM channels on one output can be
used as multiple concurrent channels to avoid head-of-line blocking [14], which results in
lower latency and higher throughput. In particular, LIONS uses a single fixed-wavelength
transmitter per node, with SOA-MZI based tunable wavelength converters placed at the AWGR
inputs to route the traffic to the desired AWGR output ports. The 1-by-k DEMUX and k
parallel receivers at each output node accommodate up to k concurrent packets using k
different wavelengths, which greatly reduces the contention probability and the average
end-to-end latency. The contended packets enter an electrical shared loopback buffer and re-enter
the AWGR through a dedicated AWGR input/output port pair. A centralized electrical control
plane handles all the contention and packet retransmission [11]. Note that the above LIONS
switch architecture was designed for rack-to-rack or cluster-to-cluster applications, while this
paper discusses a new AWGR-based switch architecture for on-chip communication in HPC
systems. In this case, thanks to the very short distance between the nodes and the switch, the nodes
(processors) communicate with the centralized controller directly in the electrical domain, and
the packets, stored in the input queues, are transmitted only after the node requests are
acknowledged and the grants are received. This on-chip architecture therefore does not require any
electrical loopback buffer or wavelength converters at the AWGR inputs. Tunable lasers
can be used directly at the node TXs. In particular, multiple TXs per node can be used to form
multiple transmitter/receiver pairs on each connecting node. Simulation results show that,
even with only two transmitter/receiver pairs, end-to-end latency and throughput are
significantly improved compared to the electrical counterpart at high (>90%) input load. In
addition, we observe zero packet loss even at 100% input load under the simulated scenarios.
In terms of photonic device implementation, we propose to use silicon ring modulators with a
broadcast optical comb source to replace the SOA-MZI TWCs on the transmitter side, and to
use ring resonators as DEMUXs on the receiver side. The main building blocks (AWGRs,
ring resonators and Ge PDs) are all available on the Silicon-On-Insulator (SOI) platform,
which results in a compact and cost-effective device. Finally, we present a prototype based on an
8 x 8 200-GHz-spaced AWGR with four transmitter/receiver pairs on each node. The
footprint of the device, fabricated using the standard microelectronic
foundry service offered by OpSIS-IME [15], is 1.2 mm by 2.4 mm.

We organize the remainder of the paper as follows: Section 2 describes the proposed
scalable optical interconnect architecture based on an AWGR with ring resonators and Ge
photodetectors on an SOI platform. Section 3 presents the performance study of the proposed
switch using a clock-cycle-accurate architecture-level simulator. Section 4 describes the
design and characterization of the 8 x 8 switch prototype fabricated by OpSIS-IME. Section 5
presents experimental demonstrations of successful routing functions on the fabricated
device. Section 6 concludes the paper.

2.  AWGR  based  optical  interconnects  with  multiple  transmitters/receivers  at  each  node 

Wavelength division multiplexing (WDM) technology allows for frequency-domain
parallelism. Meanwhile, an AWGR allows the multiplexed wavelengths in the waveguides to
be separated and cross-connected. As shown in Fig. 1, N nodes (N = 8 in this example)
respectively connected to the N input ports of an AWGR can use N wavelengths to reach
different output ports simultaneously without interfering with each other. The cyclic
frequency feature [16] guarantees that the same set of wavelengths can be used at each node. In
principle, the single passive AWGR-based all-to-all interconnection of N nodes in a star
topology provides the densest communication pattern that can be implemented in a computing
network, provided that an N x 1 optical multiplexer (1 x N de-multiplexer) and N transmitters
(receivers) are available for each AWGR input (output) port. This configuration requires N^2
transmitter/receiver pairs in total, which does not scale. Alternatively, an AWGR with fixed kt
(kt < N) transmitters and kr (kr < N) receivers at each interconnecting node can potentially
provide a scalable and efficient solution [11].
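
The wavelength routing described above can be sketched in a few lines. This is only an illustration: it assumes the common cyclic rule output = (input + wavelength index) mod N, which may differ from the port numbering in Fig. 1, and the function names are hypothetical. It also compares the N^2 transmitter/receiver pairs of the full all-to-all case with the N x k pairs needed when each node has k pairs.

```python
def awgr_output_port(input_port: int, wavelength: int, n_ports: int) -> int:
    """Output port reached from `input_port` on wavelength index `wavelength`.

    Assumed cyclic rule (illustrative, not taken from Fig. 1):
    output = (input + wavelength) mod N.
    """
    return (input_port + wavelength) % n_ports


def routing_table(n_ports: int) -> list[list[int]]:
    """Row i lists the output port reached from input i for each wavelength."""
    return [
        [awgr_output_port(i, w, n_ports) for w in range(n_ports)]
        for i in range(n_ports)
    ]


def transceiver_pairs(n_ports: int, k: int) -> int:
    """Total transmitter/receiver pairs with k pairs per node (k = N is all-to-all)."""
    return n_ports * k


if __name__ == "__main__":
    n = 8
    for i, row in enumerate(routing_table(n)):
        print(f"input {i}: output port per wavelength {row}")
    print("all-to-all pairs (N^2):", transceiver_pairs(n, n))
    print("pairs with k = 2 per node:", transceiver_pairs(n, 2))
```

Under this rule every input reaches every output on exactly one wavelength, and a given wavelength launched from different inputs never lands on the same output, which is the non-interference property the paragraph relies on.
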
Fig. 1. (a) The routing property and (b) the routing table of an 8 x 8 cyclic-frequency AWGR.

Fig. 2. The proposed interconnect architecture for chip-scale high performance computing.

Figure  2  shows  the  proposed  chip-scale  interconnect  architecture.  We  assume  a 
centralized  control  plane  here  for  simplicity,  which  can  be  potentially  replaced  by  a 
distributed  one  using  the  all-optical  TOKEN  technique  [17].  Each  node  has  a  transmitter  array 
that  uses  kt  ring  modulators  to  generate  the  data  packets.  An  off-chip  comb  generator  provides 
the  N  wavelengths  required  by  the  cyclic  frequency  AWGR  for  wavelength  routing.  We  route 
the  waveguides  in  such  a  way  that  only  selected  comb  lines  enter  the  AWGR  to  avoid 
crosstalk  from  the  unused  ones.  Another  solution  is  to  assign  a  fast  tunable  laser  for  each  ring 
modulator. We choose the silicon ring modulator for its compactness, wavelength selectivity
and energy efficiency [7]. The rings at the input side serve as both optical modulator and
wavelength MUX. As a result, they have two sets of control electrodes. The low-speed
pads are for aligning the ring resonances with the AWGR passbands, while we apply
high-speed RF signals on the Ground-Signal or Ground-Signal-Ground pads for data modulation.
Only low-speed pads are required for the DEMUX rings on the output bus waveguides. The
receiver reads the information on the de-multiplexed wavelength after Optical-to-Electrical
(O/E) conversion. Since we have multiple rings on both input and output bus waveguides, one
node can communicate with multiple other nodes simultaneously. We restrict the number of
rings on each bus waveguide to fixed numbers kt and kr. Other than the drivers and buffers for
the  modulators  and  the  detectors,  all  the  electronic  components  can  work  at  a  much  lower 
speed  than  the  line  rate,  so  the  proposed  switch  can  be  potentially  more  power  efficient  than 
the  conventional  electronic  switches. 

The scalability of the proposed switch depends on the scalability of the AWGR. In theory,
fabrication of large-port-count AWGRs is possible, but limiting factors, such as difficulties in
accurate wavelength registration and high crosstalk due to dense channel spacing and the large
number of channels, prevent such systems from being deployed on a large scale. However, it is
feasible to use small AWGRs with fewer wavelengths while supporting the same
connectivity as a large-port-count AWGR [18]. Interconnecting N nodes is possible by using W
(W < N) wavelengths and W x W AWGRs. Meanwhile, the ring FSR should be larger than N x
AWGR channel spacing in a single-AWGR configuration, so that each ring resonance aligns with
only one of the AWGR passbands. Assuming a ring resonator with a 5-µm radius [19] (approx
2.4-THz FSR), the switch can easily accommodate 32 (48 maximum) wavelength channels with a
50-GHz channel spacing. Based on the above analysis, a larger-scale switch can be built from
32 x 32 AWGRs [6] and ring resonators with 5-µm radii.
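
The channel-count argument above is simple arithmetic, sketched below using only the FSR and channel spacing quoted in the text. The function names are hypothetical, and the FSR is taken as given rather than derived from the ring geometry.

```python
import math


def max_channels(ring_fsr_ghz: float, channel_spacing_ghz: float) -> int:
    """Largest number of channels whose total span fits within one ring FSR."""
    return math.floor(ring_fsr_ghz / channel_spacing_ghz)


def occupied_span_ghz(n_channels: int, channel_spacing_ghz: float) -> float:
    """Spectral span taken by n_channels at the given spacing."""
    return n_channels * channel_spacing_ghz


if __name__ == "__main__":
    fsr_ghz = 2400.0      # approx FSR quoted for a 5-um-radius ring
    spacing_ghz = 50.0    # AWGR channel spacing quoted in the text
    print("maximum channels:", max_channels(fsr_ghz, spacing_ghz))
    print("span of 32 channels (GHz):", occupied_span_ghz(32, spacing_ghz))
```

With the values in the text, 32 channels span 1600 GHz, comfortably inside the approx 2400-GHz FSR, and 48 is the upper limit. Applying the same function to the prototype values quoted in Section 4 (approx 400-GHz ring FSR, 200-GHz spacing) gives 2, fewer than the 8 AWGR channels, which is consistent with the resonance overlap reported there.
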

3.  Performance  study  of  the  proposed  interconnect 

We  develop  a  clock-cycle-accurate  architecture-level  simulator  and  simulate  the  proposed 
interconnect  switch  with  8  nodes  and  64  nodes.  For  simplicity,  we  only  consider  the  cases 
where  kt  and  kr  are  equal  and  compare  the  scenarios  with  and  without  the  presence  of  the 
virtual  output  queues  (VOQ).  Each  transmitter  is  only  responsible  for  N/kt  output  ports. 
Likewise, each receiver only takes data from N/kr input ports. We define m = N/kr, where m is
the number of wavelengths in a contention group [11]. Contention only happens when
multiple inputs use wavelengths within the same contention group to reach the same receiver.
From a practical point of view, instead of a port-dependent wavelength partition, it is desirable
to have the same wavelength partition for the DEMUX rings on each output bus waveguide.
Given the cyclic wavelength routing characteristics of the AWGR, we must then group different input
ports together to form contention groups for different output ports. This static contention
group partitioning method may degrade the throughput, but it greatly reduces system
complexity [11]. Table 1 illustrates the mapping between the receivers and input nodes for
the case of N = 8, kt = 4 and kr = 4.
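
A minimal sketch of static contention-group partitioning follows. It is not a reproduction of Table 1: it assumes the cyclic rule output = (input + wavelength index) mod N and assumes that receiver r at every output takes the contiguous wavelength indices r*m to (r+1)*m - 1. Both conventions and all names are illustrative.

```python
def contention_group_size(n_nodes: int, k_r: int) -> int:
    """m = N / kr, the number of wavelengths per contention group."""
    if n_nodes % k_r != 0:
        raise ValueError("this sketch assumes kr divides N")
    return n_nodes // k_r


def receiver_input_map(n_nodes: int, k_r: int) -> dict[tuple[int, int], list[int]]:
    """Map (output port, receiver index) to the input nodes that receiver serves.

    Every output uses the same wavelength partition; because of the assumed
    cyclic routing, the set of input nodes behind each receiver then differs
    from one output port to the next.
    """
    m = contention_group_size(n_nodes, k_r)
    mapping = {}
    for out in range(n_nodes):
        for r in range(k_r):
            wavelengths = range(r * m, (r + 1) * m)
            # Invert output = (input + wavelength) mod N to find the input.
            mapping[(out, r)] = sorted((out - w) % n_nodes for w in wavelengths)
    return mapping


if __name__ == "__main__":
    for key, inputs in receiver_input_map(8, 4).items():
        print(f"output {key[0]}, receiver {key[1]}: inputs {inputs}")
```

In this sketch two packets contend only when their sources fall in the same list for the same (output, receiver) pair, which mirrors the contention condition stated above.
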


Table  1.  The  mapping  between  the  input  nodes  and  receivers. 

We assume uniform random traffic with a packet size of 1024 B in all the simulations. The
inter-arrival time between two packets follows a Bernoulli distribution. The line rate is set at 10
Gb/s. We define the maximum offered load as N x line rate, which does not change with kt
and kr. For each node, there is a 16-KB input buffer for each transmitter. There is a 10-ns
guard time between any two consecutive transmissions due to the ring reconfiguration time
[20]. Unlike interconnect switches for data center applications, here we neglect the
propagation time from the nodes to the AWGR because it will be less than 10 ps (less than 2
mm distance) for future on-chip multi-core high performance computing systems. The control
plane runs on a 2-GHz clock and takes three clock cycles to arbitrate each request.
The control plane handles all the contentions based on round-robin arbitration. The transmitter
sends out the next packet right after it gets the permission, while a contended packet
stays in the input buffer and waits for the next grant. The transmitter sends a new request to
the control plane immediately after a failed attempt or a successful packet transmission. The
case where kt = 1 and kr = 1 represents the simulation for the electrical switch based on an
input-queuing (IQ) crossbar topology.
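
The timing figures implied by these parameters can be worked out directly. The sketch below uses only the values stated above; reading "16-KB" as 16 x 1024 bytes is an assumption, and the guard-time overhead is a back-of-the-envelope estimate for back-to-back transmissions, not a description of how the simulator schedules packets.

```python
PACKET_BYTES = 1024        # packet size from the text
LINE_RATE_BPS = 10e9       # 10 Gb/s line rate
GUARD_TIME_S = 10e-9       # guard time for ring reconfiguration
CONTROL_CLOCK_HZ = 2e9     # control-plane clock
ARBITRATION_CYCLES = 3     # clock cycles per arbitration
BUFFER_BYTES = 16 * 1024   # assumption: "16-KB" read as 16 x 1024 bytes


def packet_time_s() -> float:
    """Time to serialize one packet at the line rate."""
    return PACKET_BYTES * 8 / LINE_RATE_BPS


def arbitration_time_s() -> float:
    """Time the control plane spends arbitrating one request."""
    return ARBITRATION_CYCLES / CONTROL_CLOCK_HZ


def packets_per_buffer() -> int:
    """Whole packets that fit in one transmitter input buffer."""
    return BUFFER_BYTES // PACKET_BYTES


def guard_overhead_fraction() -> float:
    """Share of channel time spent on guard time for back-to-back packets."""
    return GUARD_TIME_S / (packet_time_s() + GUARD_TIME_S)


if __name__ == "__main__":
    print(f"packet time: {packet_time_s() * 1e9:.1f} ns")
    print(f"arbitration time: {arbitration_time_s() * 1e9:.1f} ns")
    print(f"packets per buffer: {packets_per_buffer()}")
    print(f"guard overhead: {guard_overhead_fraction() * 100:.2f} %")
```

Both the arbitration time and the guard time are small compared with the serialization time of a 1024-B packet, so under these assumptions queuing and contention, rather than control overhead, should dominate the latency differences between configurations.
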


Fig.  3.  Performance  study  on  (a)  end-to-end  latency,  (b)  throughput  and  (c)  packet  loss  rate  as 
functions  of  the  offered  load  for  uniform  random  traffic  distribution  on  proposed  architecture 
with 8 nodes. Left: without VOQ; right: with VOQ.

Figure  3  shows  the  performance  study  of  the  proposed  architecture  based  on  an  8  x  8 
AWGR  under  various  configurations.  We  investigate  the  end-to-end  latency,  throughput  and 
packet loss rate as functions of the input load. Without VOQ, multiple transmitter/receiver
pairs provide a significant boost in performance due to the increased statistical multiplexing
and the enhanced instantaneous rate at each AWGR input and output. There is only marginal
improvement in end-to-end latency when kt and kr go from 2 to 4, which indicates that even a
moderate increase in kt and kr from their original value of 1 substantially improves the
performance. The presence of the VOQ greatly improves the system performance for the
single-transmitter/receiver-pair configuration in all three aspects, but a system equipped with multiple
transmitter/receiver pairs still performs better, especially in terms of end-to-end latency at high
(>90%) input load. The above analysis still holds for the switch with 64 interconnecting
nodes. Increasing the network size does increase the end-to-end latency, but the influence is
insignificant compared to the change of kt and kr, as shown in Fig. 4. We observe zero packet
loss and 100% throughput for all the cases where kt ≥ 2 and kr ≥ 2.


Fig.  4.  Performance  study  on  (a)  end-to-end  latency,  (b)  throughput  and  (c)  packet  loss  rate  as 
functions  of  the  offered  load  for  uniform  random  traffic  distribution  on  proposed  architecture 
with 64 nodes. Left: without VOQ; right: with VOQ.

4.  Design  and  characterization  of  an  8  x  8  switch  prototype  with  kt  =  4  and  kr  =  4 


Table 2. OpSIS-IME Process overview.

Parameter/Device Type     Features
------------------------  ---------------------------------------------------
Overview                  220 nm thick starting Si, 2 µm BOX
                          Standard and/or high resistivity handle
                          Photonics-only process, no electronics included
Front-end                 Two partial etches, one full etch of top silicon
                          Six optical implants for modulators and detectors
                          100% Ge deposition and implanting
Back-end                  Two metal levels, no planarization
                          Deep Si trench for edge coupling
Optical library devices   Grating couplers
                          Low loss waveguides (ridge and rib)
                          High-speed modulators (reverse-biased pn junction)
                          High-speed Ge waveguide photodiodes

The  main  device  consists  of  three  major  components,  the  AWGR,  the  ring  modulators  and  Ge 
photo-detectors  as  shown  in  Fig.  5(a).  The  total  device  area  is  approx  1.2  mm  by  2.4  mm. 
Figure  5(d)  shows  a  photo  of  the  fabricated  device.  We  do  not  include  the  cyclic  frequency 
feature  in  the  8x8  AWGR  design  for  this  particular  proof-of-concept  demonstration  to  save 
device  area  [16].  Other  components  such  as  ring  modulators,  Ge  PDs  and  edge  couplers  are 
from  the  device  library  provided  by  OpSIS-IME  [15].  Detailed  design  parameters  for  those 
devices are not included here. Unlike the configuration in Fig. 2, there are no waveguides on
top of the ring modulators here, as shown in Fig. 5(b). Instead, for simplicity, we couple external light
sources into the input bus waveguides directly through the edge couplers. On
the output side, as shown in Fig. 5(c), the DEMUX rings select the modulated signals from
the output bus waveguides and couple them into the corresponding Ge PDs. The nano-taper
edge couplers on the input/output bus waveguides ensure efficient light coupling between the
chip and single-mode lens fibers. After the mask layout preparation, we use the foundry service
offered by OpSIS-IME, which runs in a standard microelectronics facility. Table 2 shows an
overview of the process offered by OpSIS-IME.


Fig.  5.  (a)  Device  mask  layout  showing  only  the  waveguide  layer  and  the  metal  layer  (b)  rings 
on  one  of  the  input  bus  waveguides,  (c)  rings  and  PDs  on  one  of  the  output  bus  waveguides, 
(d)  Photo  of  fabricated  device. 

The proposed switch can potentially be very compact compared to a state-of-the-art
electronic switch such as the Mellanox SwitchX-2 switching IC, which supports up to 64
10-GbE ports and uses a 45 mm x 45 mm (2025 mm^2) FCBGA package. We have demonstrated a
low-loss 40-port 100-GHz AWG with 3-dB channel bandwidth larger than 10 GHz on the
same SOI platform [5]. Taking a 40 x 40 AWGR based on this recently realized 40-port AWG [5]
as an example, we estimate the footprint of the proposed switch as
follows. The 40 x 40 AWGR takes approximately 2.0 mm x 2.5 mm, or 5.0 mm^2 [5]. The
footprints of the ring and PD are 0.04 mm^2 and 0.03 mm^2, respectively. Assuming we have 2
TX/RX ring pairs on each connecting node (kt = 2, kr = 2) and one PD for each RX ring, we
have 160 rings and 80 PDs in total, which occupy 0.04 mm^2 x 160 + 0.03 mm^2 x 80 = 8.8
mm^2 of device area. The total area needed for the optical/optoelectronic devices is approx 5.0
mm^2 + 8.8 mm^2 = 13.8 mm^2. Other than the high-speed buffers for the optical ring modulators
and Ge PDs, which are shared by both switches, the proposed switch requires no other
high-speed electronics, leaving sufficient space for the remaining electronics, which work at a much
lower speed than the line rate. Moreover, the vertical integration of electronic components
with optical components can further reduce the overall footprint, as illustrated in [21].
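
The footprint estimate above can be reproduced with the short sketch below, which counts kt + kr rings and kr PDs per node and uses the per-device areas quoted in the text. The function name and argument names are hypothetical.

```python
def optical_footprint(n_ports: int, k_t: int, k_r: int,
                      awgr_area_mm2: float,
                      ring_area_mm2: float,
                      pd_area_mm2: float) -> tuple[int, int, float]:
    """Return (ring count, PD count, total optical device area in mm^2)."""
    rings = n_ports * (k_t + k_r)   # TX rings plus RX (DEMUX) rings
    pds = n_ports * k_r             # one PD per RX ring
    total = awgr_area_mm2 + rings * ring_area_mm2 + pds * pd_area_mm2
    return rings, pds, total


if __name__ == "__main__":
    rings, pds, area = optical_footprint(
        n_ports=40, k_t=2, k_r=2,
        awgr_area_mm2=5.0, ring_area_mm2=0.04, pd_area_mm2=0.03,
    )
    print(f"{rings} rings, {pds} PDs, {area:.1f} mm^2 of optical device area")
```

Because the ring and PD counts grow linearly with the port count for fixed kt and kr, the same function gives a quick first-order estimate for other switch sizes once an AWGR area for that port count is known.
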

Figure 6(a) shows the measured transmission spectra of the 8 x 8 200-GHz-spaced AWGR,
including fiber-to-chip coupling loss (approx 4 dB per facet) and waveguide propagation loss
(approx 2.5 dB/cm). The estimated insertion loss introduced by the AWGR itself is approx 9
dB. The channel crosstalk is approx -13 dB, which can be further improved by using special
waveguide designs in the arrayed arms [22, 23] or including phase-error-compensating
elements in the AWG design [24, 25]. Figures 7(b) and 7(c) show the mask layouts of the
depletion-mode ring modulator and Ge photodetector from OpSIS-IME's library. The
measurement results from the test structures show that those devices can support data
modulation at 10 Gb/s, which is consistent with the data provided by OpSIS-IME. With a
30-µm radius, the ring FSR is approx 400 GHz. The resonance tunability of the ring is limited
to approx 6 pm/V. For this demonstration, we rely on carrier injection by an optical pump
[26] at 1064 nm from above to align the ring resonances with the AWGR passbands. Due to
the relatively small ring FSR, ring resonances can overlap with multiple AWGR passbands and
induce crosstalk. OpSIS-IME now provides ring resonators with FSR up to 1600 GHz (8-µm
ring radius) and resonance tunability larger than one FSR (by electrical carrier injection in a
p-i-n junction [15]). This new design eliminates the crosstalk issue in the current design and
greatly simplifies the resonance tuning process. Note that, with only one waveguide coupled
to each ring as in Fig. 5(b) and Fig. 7(a), the input ring resonators are in all-pass-filter
(APF) configurations. This deviation from the original OpSIS-IME design in Fig. 7(b)
degrades the ring on-off extinction ratio under reverse bias. We therefore use the carrier injection method
instead for data modulation on the input rings. Figure 6(b) illustrates the static transmission
spectra of a ring in APF configuration under various forward bias voltages. We observe an
8-dB extinction ratio for data rates up to 0.3 Gb/s, which is typical for a ring modulator in forward
carrier injection mode without pre-emphasis on the electrical driving signal [19]. We will
follow OpSIS-IME's original add-drop configuration for future designs in order to guarantee
high-speed performance.

Fig. 6. Measured transmission spectra of (a) the 8 x 8 AWGR (b) the ring resonator under
different  forward  bias. 


Fig.  7.  Layout  of  the  ring  modulator  (a)  in  all-pass-filter  configuration  (without  top 
waveguide), (b) in add-drop configuration (with top waveguide) and (c) Germanium
photodetector (only waveguide and metal layers are shown).


5.  Routing  experimental  setup  and  results 



Fig.  8.  Experimental  setup  for  the  routing  demonstration  on  the  fabricated  chip.  BPF:  optical 
band-pass filter; ATT: optical attenuator; PPG: pulse pattern generator.

Figure  8  illustrates  the  experimental  setup  for  the  proof-of-principle  routing  demonstration  on 
the  fabricated  chip.  The  output  of  a  tunable  light  source  enters  the  switch  from  one  of  the 
input  waveguides  after  amplitude  and  polarization  adjustment.  The  coupling  loss  from  the 
lens  fiber  to  the  chip  is  approx  4.0  dB  when  measured  using  the  straight  waveguide  test 
structures on the same chip. We inject the 1064-nm light from two cleaved fibers into the
input/output ring pairs from above to align their resonances with the input wavelength and
one of the AWGR passbands. A Bias Tee combines the high-speed signal from a Pulse
Pattern Generator (PPG) with the DC bias voltage for driving the input ring modulator, while
we apply only a DC bias voltage on the output ring. For future designs with OpSIS-IME's
electrically tunable rings, high-speed (>100-Msps update rate) Digital-to-Analog Converters
(DACs) will provide the current required to rapidly configure the ring resonance position.
With proper alignment between the ring resonance and the AWGR passband, the modulated light
eventually enters the corresponding photodetector. A second Bias Tee provides a 1-V reverse
bias for the PD and extracts the O/E-converted high-speed signal for eye diagram and Bit
Error Rate (BER) measurements. With the output ring misaligned, a lens fiber collects the
optical signal from one of the output waveguides and sends the signal to a scope or Optical
Spectrum Analyzer (OSA) for monitoring purposes.

Fig. 9. (a) Experimental configuration for 4-by-1 routing demonstration (b) measured BER.

Fig. 10. (a) Experimental configuration for 1-by-4 routing demonstration (b) measured BER.

Figure 9 and Fig. 10 show the selected input/output ring pairs, eye diagrams and
corresponding BER measurements for the 4-by-1 and 1-by-4 routing demonstrations at 0.3
Gb/s, respectively. The PPG drives the selected input ring through a Bias Tee using a 2^7-1
PRBS sequence. Due to the limited resources available at the time of measurement, we only
activate one input/output ring pair for each BER measurement. As a result, the penalty due to
intermodulation crosstalk between multiple active ring modulators is not available at this
time. Resonance adjustment on the other rings sharing the same bus waveguide is not
necessary since their resonance positions are far away from any of the AWGR passbands.
The BER measurements show less than 1-dB power penalty between the different input/output
ring pairs, but more than 4-dBm optical power is required to achieve a BER below 1E-10 at 0.3
Gb/s. Note that the optical power level is measured before the input lens fiber, so it includes
the fiber-to-chip coupling loss (~4 dB), AWGR insertion loss (~9 dB), waveguide propagation
loss and ring coupling loss.
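
As a rough illustration of this loss budget, the sketch below subtracts the two quantified losses from the launch power. The waveguide propagation and ring coupling losses are not quantified in the text, so the result is only an upper bound on the power available near the photodetector. The function names are hypothetical.

```python
def remaining_power_dbm(launch_dbm: float, losses_db: list[float]) -> float:
    """Power left after subtracting a list of losses (all values in dB)."""
    return launch_dbm - sum(losses_db)


def dbm_to_mw(power_dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


if __name__ == "__main__":
    launch_dbm = 4.0              # power measured before the input lens fiber
    known_losses_db = [4.0, 9.0]  # input-facet coupling loss, AWGR insertion loss
    upper_bound_dbm = remaining_power_dbm(launch_dbm, known_losses_db)
    print(f"upper bound at the receiver side: {upper_bound_dbm:.1f} dBm "
          f"({dbm_to_mw(upper_bound_dbm):.3f} mW)")
```

Only one fiber-to-chip facet is counted because the signal is detected by the on-chip Ge PD rather than coupled back into a fiber.
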

Fig. 11. (a) AWGR transmission spectra over three FSRs, (b) experimental configuration for
transmission  using  multiple  FSRs,  (c)  measured  BER. 

Figure  11(a)  illustrates  the  measured  AWGR  passbands  across  the  three  FSRs  between 
1520  nm  and  1570  nm  on  the  fabricated  device  using  a  broadband  ASE  from  an  EDFA  and 
the OSA. The average channel crosstalk is approx -13 dB before phase error correction. Using
the input/output ring pairs as shown in Fig. 11(b), we measure the BER performance of data
transmissions on three wavelengths (1521.9 nm, 1540.9 nm and 1560.6 nm) at 0.3 Gb/s. We
observe no power penalties, as shown in Fig. 11(c). Future high performance computing
systems  may  benefit  from  parallel-bit  transmission  by  utilizing  multiple  AWGR  FSRs. 

6.  Conclusion 

We propose a scalable chip-scale optical interconnect switch for HPC systems by leveraging the
unique wavelength routing characteristics of the AWGR and the compact and cost-effective
devices offered by CMOS-compatible silicon photonic integration. We present a
comprehensive performance evaluation for the proposed switch, showing low end-to-end
latency and high-throughput switching without packet loss even at very high (>95%) input
load. We show that the performance is impressive even with a small number of input/output ring
pairs (kt = 2 and kr = 2) on each node. Furthermore, we prove the feasibility of the proposed
architecture by developing a prototype using silicon photonic integration technology on an SOI
platform. We successfully demonstrate wavelength routing functions on the fabricated device.
The development of a more advanced interconnect switch chip with better ring controllability
and larger throughput is now in progress.